Method and apparatus for postal address matching

ABSTRACT

Provided are methods and apparatus for matching postal addresses. In an example, provided is a method for comparing postal addresses. The method includes receiving a first postal address, standardizing the form of the first postal address, removing a component of the first postal address to create a canonical representation of the first postal address, and utilizing a signature-based algorithm to identify at least one stored signature which substantially matches the first postal address.

FIELD OF DISCLOSURE

This disclosure relates generally to electronics, and more specifically, but not exclusively, to methods and apparatus which match postal addresses.

BACKGROUND

Conventional techniques implement brute-force algorithms to compare two postal addresses and identify matching addresses. The brute-force algorithms enumerate data in a postal address and directly check the enumerated data against data in the other postal address. Some conventional techniques require a perfect match between the data in the two postal addresses to produce a “match” result. However, these conventional techniques also erroneously reject sufficiently close “near-matches.” Rejected near-matches can occur when there is a minor error in one of the postal addresses, such as a misspelling of a name, which prevents the brute-force algorithms from identifying a perfect match. Other conventional techniques require a high level of complexity to produce high accuracy.

SUMMARY

This summary provides a basic understanding of some aspects of the present teachings. This summary is not exhaustive in detail, and is neither intended to identify all critical features, nor intended to limit the scope of the claims.

In an example, a method for comparing postal addresses is provided. The method can be at least a part of an address cleansing process, an address validating process, the like, or a practicable combination thereof. Address cleansing can detect errors in addresses, detect stale addresses, detect formatting errors, correct these errors, the like, or a practicable combination thereof. The method includes receiving a first postal address, forming a token set from at least a portion of the first postal address, and creating a postal address signature from the token set. The method can also include receiving the first postal address via a computer network, from a computer, a mobile device, a wearable device, a cloud-based computer network, or a combination thereof. The postal address signature is a fuzzy-token based signature. The token set can be formed at least in part from a street name, a city name, or a combination thereof. The postal address signature can be a q-gram based signature, where “q” is a number equal to or greater than two. The postal address signature can be a partition-NED (Normalized Edit Distance) based signature.
Edit distance (ED) is the minimum number of single-character edit operations (e.g., insertion, deletion, substitution, and the like) needed to transform one string to another string. The postal address signature can be a Deletion-Based neighborhood generation algorithm-based signature, an A O(log n) Signature-Based String Matching Algorithm-based signature, an AdaptJoin algorithm-based signature, a VChunk algorithm-based signature, a PassJoin algorithm-based signature, a FastSS algorithm-based signature, an ExB: Exclusion-based string matching algorithm-based signature, a Part-Enum algorithm-based signature, or a Partition-ED algorithm-based signature. The method also includes identifying a matching signature by comparing the postal address signature to at least one stored signature in a plurality of stored signatures, identifying a second postal address corresponding to the matching signature, and identifying a match between the first postal address and the second postal address based on the matching signature. The method can also include removing at least one component of the first postal address prior to the forming the token set. The at least one component of the first postal address can be a street address, a postal code, or a combination thereof. The postal code can be a zip code. The identifying of the match can include comparing the at least one removed component to a corresponding component of the second postal address. The method can include standardizing, prior to creating the token set, a component in the first postal address. The standardizing can be to a country-specific standard. The method can also include correcting a misspelling in the first postal address prior to creating the postal address signature.

In a further example, provided is a non-transitory computer-readable medium, including processor-executable instructions stored thereon. The processor-executable instructions are configured to cause a processor to initiate executing one or more parts of the aforementioned method. The non-transitory computer-readable medium can be integrated with a computing device.

In another example, provided is a first apparatus configured to compare postal addresses. The first apparatus includes means for receiving a first postal address, means for forming a token set from at least a portion of the first postal address, and means for creating a postal address signature from the token set. The means for receiving the first postal address can include means for receiving the first postal address via a computer network, from a computer, a mobile device, a wearable device, a cloud-based computer network, or a combination thereof. The first apparatus can also include means for removing at least one component of the first postal address prior to the forming the token set. The at least one component of the first postal address can be a street address, a postal code, or a combination thereof. The postal code can be a zip code. The first apparatus can also include means for standardizing, prior to creating the token set, a component in the first postal address. The standardizing can be to a country-specific standard. The token set can be formed at least in part from a street name, a city name, or a combination thereof. The first apparatus can also include means for correcting a misspelling in the first postal address prior to creating the postal address signature. The postal address signature is a fuzzy-token based signature. The postal address signature can be a q-gram based signature, where “q” is a number equal to or greater than two. The postal address signature can be a partition-NED based signature. The postal address signature can be a Deletion-Based neighborhood generation algorithm-based signature, an A O(log n) Signature-Based String Matching Algorithm-based signature, an AdaptJoin algorithm-based signature, a VChunk algorithm-based signature, a PassJoin algorithm-based signature, a FastSS algorithm-based signature, an ExB: Exclusion-based string matching algorithm-based signature, a Part-Enum algorithm-based signature, or a Partition-ED algorithm-based signature.
The first apparatus also includes means for identifying a matching signature by comparing the postal address signature to at least one stored signature in a plurality of stored signatures, means for identifying a second postal address corresponding to the matching signature, and means for identifying a match between the first postal address and the second postal address based on the matching signature. The means for identifying a second postal address can be at least a part of a device configured to perform an address cleansing process, an address validating process, or a combination thereof. The first apparatus can also include a computing device, of which the means for identifying the second postal address is a constituent part. The means for identifying the match can further comprise means for comparing the at least one removed component to a corresponding component of the second postal address.

In another example, provided is a second apparatus configured to compare postal addresses. The second apparatus includes a processor and a memory coupled to the processor. The memory is configured to cause the processor to create specific logic circuits within the processor. The memory can be configured to cause the processor to initiate creating specific logic circuits configured to cause the processor to initiate performing the identifying the second postal address as a part of an address cleansing process, an address validating process, or a combination thereof. The specific logic circuits are configured to cause the processor to initiate receiving a first postal address, initiate forming a token set from at least a portion of the first postal address, and initiate creating a postal address signature from the token set. The memory can be configured to cause the processor to initiate creating specific logic circuits configured to cause the processor to initiate receiving the first postal address via a computer network, from a computer, a mobile device, a wearable device, a cloud-based computer network, or a combination thereof. The memory can be configured to cause the processor to initiate creating specific logic circuits configured to initiate causing the processor to form the token set at least in part from a street name, a city name, or a combination thereof. The memory can be configured to cause the processor to initiate creating specific logic circuits configured to initiate causing the processor to correct a misspelling in the first postal address prior to creating the postal address signature. The memory can be configured to cause the processor to initiate creating specific logic circuits configured to cause the processor to initiate standardizing, prior to creating the token set, a component in the first postal address. The standardizing can be to a country-specific standard. The postal address signature is a fuzzy-token based signature.
The postal address signature can be a q-gram based signature, where “q” is a number equal to or greater than two. The postal address signature can be a partition-NED based signature. The specific logic circuits are configured to cause the processor to initiate identifying a matching signature by comparing the postal address signature to at least one stored signature in a plurality of stored signatures, initiate identifying a second postal address corresponding to the matching signature, and initiate identifying a match between the first postal address and the second postal address based on the matching signature. The memory can be configured to cause the processor to initiate creating specific logic circuits configured to cause the processor to initiate removing at least one component of the first postal address prior to the forming the token set. The at least one component of the first postal address can be a street address, a postal code, or a combination thereof. The postal code can be a zip code. The memory can be configured to cause the processor to initiate creating specific logic circuits configured to cause the processor to initiate performing the identifying the match by initiating comparing the at least one removed component to a corresponding component of the second postal address. The second apparatus can include a computing device with which the processor is integrated. The processor can be a microprocessor, a microcontroller, a digital signal processor, a field programmable gate array, a programmable logic device, an application-specific integrated circuit, a controller, a non-generic special-purpose processor, a state machine, a gated logic device, a discrete hardware component, a dedicated hardware finite state machine, or a combination thereof.

The foregoing broadly outlines some of the features and technical advantages of the present teachings so the detailed description and drawings can be better understood. Additional features and advantages are also described in the detailed description. The conception and disclosed examples can be used as a basis for modifying or designing other devices for carrying out the same purposes of the present teachings. Such equivalent constructions do not depart from the technology of the teachings as set forth in the claims. The inventive features characteristic of the teachings, together with further objects and advantages, are better understood from the detailed description and the accompanying drawings. Each of the drawings is provided for the purpose of illustration and description only, and does not limit the present teachings.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings are presented to describe examples of the present teachings, and are not limiting.

FIGS. 1A-1B depict an example method for comparing postal addresses.

FIGS. 2A-2B depict an example method for preparing a plurality of stored signatures.

FIG. 3 depicts an example computing device.

FIG. 4 depicts an example network.

In accordance with common practice, the features depicted by the drawings may not be drawn to scale. Accordingly, the dimensions of the depicted features may be arbitrarily expanded or reduced for clarity. In accordance with common practice, some of the drawings are simplified for clarity. Thus, the drawings may not depict all components of a particular apparatus or method. Further, like reference numerals denote like features throughout the specification and figures.

DETAILED DESCRIPTION

Methods and apparatuses for comparing data are provided. Examples of the provided methods and apparatus can find substantially similar string pairs from collections of strings. In an example, the strings are two sets of postal addresses. When compared to conventional techniques, the provided methods and apparatus are more efficient, faster, and cost less to implement.

The following documents are incorporated by reference into this disclosure: “Fast-Join: An Efficient Method for Fuzzy Token Matching based Similarity Join” by Jiannan Wang, Guoliang Li, and Jianhua Feng, 2011 IEEE International Conference on Data Engineering, pages 458-469 (11-16 Apr. 2011) (ISSN: 106306382); and “Extending string similarity join to tolerant fuzzy token matching” by Jiannan Wang, Guoliang Li, and Jianhua Feng, Association for Computing Machinery Transactions on Database Systems, Volume 39, Number 1, Article 7 (January, 2014) (DOI: http://dx.doi.org/10.1145/2535628).

FIGS. 1A-1B depict an example method 100 for comparing postal addresses. The method 100 can implement a fuzzy token similarity algorithm. Fuzzy token similarity combines token-based similarity and character-based similarity. Fuzzy token similarity computes a fuzzy overlap by considering a degree of fuzzy match between tokens. Given two token sets, fuzzy token similarity uses a character-based similarity of token pairs from the two sets by 1.) computing an edit similarity of each token pair from the two sets, and 2.) using maximum weight matching in bipartite graphs to compute fuzzy overlap.
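The two steps above can be illustrated with a minimal Python sketch. It is not taken from this disclosure: the function names, the threshold `delta`, and the definition of edit similarity as 1 - ED / max(len(a), len(b)) are assumptions made for illustration. The maximum weight matching is found by exhaustive search, which is practical only for small token sets.

```python
from itertools import permutations


def edit_distance(a, b):
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def edit_similarity(a, b):
    """Assumed normalization: 1 - ED / length of the longer token."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def fuzzy_overlap(tokens1, tokens2, delta=0.8):
    """Weight of a maximum weight bipartite matching between two token sets.

    An edge carries weight only when its edit similarity is at least delta
    (a hypothetical threshold). Exhaustive search, for illustration only.
    """
    small, large = sorted((list(tokens1), list(tokens2)), key=len)
    best = 0.0
    for perm in permutations(large, len(small)):
        total = 0.0
        for s, t in zip(small, perm):
            sim = edit_similarity(s, t)
            if sim >= delta:
                total += sim
        best = max(best, total)
    return best
```

For example, comparing the token sets of "apple street markham" and "apple stret markham" pairs "street" with "stret" at an edit similarity of 5/6, so the misspelled token still contributes to the overlap instead of counting as a miss.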

The method 100 can be performed by the apparatus described hereby, such as a computing device 300 (as depicted in FIG. 3), an electronic device 405 (as depicted in FIG. 4), a server 415 (as depicted in FIG. 4), a remote platform 425 (as depicted in FIG. 4), the like, or a combination thereof. The method 100 can be at least a part of an address cleansing process, an address validating process, a data cleansing process, a data integration process, or a combination thereof. The method 100 can be implemented in different countries. For example, the method 100 for comparing postal addresses can be implemented in countries using a Latin-based character set, an ISO/IEC-compatible character set, a UTF-8-compatible character set, a character-based language, the like, or a combination thereof.

In block 105, a first postal address is received. The first postal address can be received from a computer network, from a computer, from a mobile device, from a wearable device, from a cloud-based computer network, the like, or a combination thereof.

The first postal address can include one or more components such as a street address, a postal code (for example, a zip code, a postcode, a codes postaux, a post code, an Eircode, a postal routing number, a Postal Index Number, a Codice di Avviamento Postale, a Código de Endereçamento Postal, a Postleitzahl, a postal zone code, a postal district code, the like, or a practicable combination thereof), a post office box number, a mailing address, a physical address, the like, or a combination thereof.

In optional block 110, a component in the first postal address is standardized prior to performing block 115. The standardizing can be done to meet a country-specific standard, at least a portion of a postal standard of a country, a postal standard of a region within a country, a commonly-used address format of a country, a commonly-used address format of a region within a country, the like, or a practicable combination thereof. For example, an abbreviation of a term can be replaced with the corresponding term, such as replacing “St.,” the abbreviation of the term “street,” with the term “street.” The standardization can convert at least a portion of the first postal address (e.g., a street name) to all capital letters.
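A minimal Python sketch of this kind of standardization follows. The abbreviation table and the function name `standardize` are hypothetical and are not part of any postal standard. A real implementation would need context to tell apart cases such as "St." meaning "Saint".

```python
# Hypothetical abbreviation table; a production table would be country-specific.
ABBREVIATIONS = {
    "ST": "STREET",
    "ST.": "STREET",
    "AVE": "AVENUE",
    "AVE.": "AVENUE",
    "RD": "ROAD",
    "RD.": "ROAD",
}


def standardize(address):
    """Convert to capital letters and replace known abbreviations with full terms."""
    words = address.upper().split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)
```

Calling `standardize("10 Apple St. Markham ON")` yields `"10 APPLE STREET MARKHAM ON"`.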

In an example, the first postal address can have a standard format, can be standardized to a standard format, or both. Non-limiting examples of postal address formats follow. In these examples, brackets indicate an optional term (for example: [Street_Type] is optional). Further, a term without brackets is preferable to include, but may not be present in a postal address. In an example, at least a portion of a format of a postal address such as the first postal address can include any of the following:

[Unit_Number] [Street_Number] [Street_Type] Street_Name [Street_Direction] Locality Province_Abbreviation Postal_Code Country

Street_Number Street_Name [Street_Type] [Street_Direction] [Floor] [Unit_Number] Locality State_Abbreviation Postal_Code Country

[Unit_Number] [Building_Name] Street_Number [Street_Type] Street_Name Postal_Code Locality Country

Street_Name Street_Number Postal_Code Locality Country

[Unit_Number] [Building_Name] [Street Number] Street_Name [Locality] Postal_Code Country

PO BOX [Box Number]

P.O. BOX [Box Number]

A postal address such as the first postal address can also have a format conforming to at least a portion of a postal standard of a country, a postal standard of a region within a country, a commonly-used address format of a country, a commonly-used address format of a region within a country, the like, or a practicable combination thereof.

In optional block 115, at least one component of the first postal address is removed prior to performing block 120. For example, a street number can be removed from the first postal address, to enable creating a signature for a street, rather than every address on the street. A postal code can be removed. The removed component can be stored for later use.
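As a minimal sketch of block 115, the following Python function separates a leading street number from the rest of an address and returns both, so the number can be stored for later use. The function name and the assumption that the street number is the first whitespace-delimited field are illustrative only; other address formats place the number elsewhere.

```python
import re


def split_street_number(address):
    """Return (street_number, remainder); street_number is None if absent.

    Assumes, for illustration, that the street number leads the address.
    """
    match = re.match(r"^\s*(\d+)\s+(.*)$", address)
    if match:
        return match.group(1), match.group(2)
    return None, address.strip()
```

For "99 Apple Street Markham" this returns the pair `("99", "Apple Street Markham")`.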

In block 120, a token set is formed from at least a portion of the first postal address. The token set can be formed at least in part from a street name, a city name, or a combination thereof. Including numerical components of the first postal address in the token set is optional.

In optional block 125, a misspelling in the first postal address is corrected prior to creating a postal address signature. Optional block 125 can be performed any time after performing block 105 and before performing block 130 in different examples. In an example, correcting misspellings is limited to certain address components. If two address records are very similar, but valid, a minor correction can be performed to prevent an error. As an example, street types can be changed. For example, given the following addresses:

1) 10 APPLE STREET MARKHAM ON

2) 10 APPLE STRET MARKHAM ON

The misspelling of “stret” is corrected to “street,” and both addresses are standardized as 10 APPLE ST MARKHAM ON. Thus, the addresses can be combined, so only one address remains.

However, not all potential misspellings are necessarily corrected. As an example, street names need not be changed. For example, given the following addresses:

1) 10 APPLE STREET MARKHAM ON

2) 10 APPLA STREET MARKHAM ON

It is possible that “Appla” is a correct street name, and is not misspelled. In this case, the street name is not changed. Thus, there are two records in this case:

1) 10 APPLE STREET MARKHAM ON

2) 10 APPLA STREET MARKHAM ON
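The selective correction described above can be sketched in Python with the standard library's `difflib.get_close_matches`. The list of street types, the cutoff of 0.8, and the function name are assumptions chosen for illustration. Only tokens that closely resemble a known street type are changed, so a street name such as "APPLA" is left alone.

```python
from difflib import get_close_matches

# Hypothetical list of known street types; real lists are country-specific.
STREET_TYPES = ["STREET", "AVENUE", "ROAD", "DRIVE", "BOULEVARD"]


def correct_street_type(address, cutoff=0.8):
    """Correct tokens that are near-misses of a known street type.

    Tokens that already are street types, or are purely numeric, pass through.
    Street names are not corrected because they may be valid as written.
    """
    corrected = []
    for token in address.upper().split():
        if token in STREET_TYPES or token.isdigit():
            corrected.append(token)
            continue
        close = get_close_matches(token, STREET_TYPES, n=1, cutoff=cutoff)
        corrected.append(close[0] if close else token)
    return " ".join(corrected)
```

One limitation of this sketch is that a legitimate name that happens to resemble a street type (for example "DRIVER") would also be rewritten, so a practical implementation would likely restrict the check to the token in the street-type position.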

In block 130, a postal address signature is created from the token set. The postal address signature can be a fuzzy-token based signature. The postal address signature can be a q-gram based signature, with “q” being a number equal to or greater than two. Optionally, the postal address signature is a partition-NED based signature. In an example, capital letters can be replaced with lower-case letters to improve a probability of a match. The postal address signature can be created with other algorithms, such as a Deletion-Based neighborhood generation algorithm, an A O(log n) Signature-Based String Matching Algorithm, an AdaptJoin algorithm, a VChunk algorithm, a PassJoin algorithm, a FastSS algorithm, an ExB: Exclusion-based string matching algorithm, a Part-Enum algorithm, a Partition-ED algorithm, a Partition-NED algorithm, the like, or a combination thereof. Example implementations can be case-sensitive or case-insensitive. Further, duplicative tokens can be removed from the postal address signature to shorten the postal address signature, reduce a data set size, and reduce processing time.

In block 135, a matching signature is identified by comparing the postal address signature to at least one stored signature in a plurality of stored signatures. The stored signature which is the closest match to the postal address signature can be identified as the matching signature. The plurality of stored signatures can optionally include a numerical component as a token, can include a numerical component as being in addition to the tokens of the stored signature, or a combination thereof. The type of stored signatures representing the stored postal address records is algorithm-specific. In the q-gram example below, the plurality of stored signatures are based on the q-gram algorithm.

In an example, the matching signature is identified based on the intersection of the postal address signature with the at least one stored signature in a plurality of stored signatures. If the number of intersecting tokens is greater than a constant “c,” a match exists. The constant “c” can be determined using a Fuzzy-Dice Similarity calculation, a Fuzzy-Cosine Similarity calculation, a Fuzzy-Jaccard Similarity calculation, the like, or a combination thereof. In examples, the matching signature identification can be case-sensitive or case-insensitive.

In an example, the comparing includes a filter step and a refining step. In the filter step, candidates of similar pairs are generated based on signatures. An inverted index can be used to generate candidates. For example, given token sets T1, T2, T3 and T4 with:

Sig(T1)={ad, ac, dc},

Sig(T2)={be, cf, em},

Sig(T3)={ad, ab, ac},

Sig(T4)={bm, cf, be}

The inverted list of ad is {T1, T3}. Thus, (T1, T3) is a set of candidate matches. As there is no signature whose inverted list contains both T1 and T2, T1 and T2 are dissimilar and can be pruned. The refining step verifies the set of candidate matches to generate the final results. Given two token sets T1 and T2, a weighted bigraph is constructed. As it is expensive to compute the maximum weighted matching, an upper bound of the maximal weight can be computed by relaxing the matching condition, such as by allowing the edges in the graph to share a common vertex. This upper bound is computed by summing the maximum weights of edges of every token in T1 (or T2). The pair (T1, T2) can be pruned if the upper bound makes F_(δ)(T1, T2) smaller than τ, where F_(δ) is a token similarity threshold for a specific signature scheme F, and τ is an overall string similarity threshold (e.g., a minimum acceptable amount of matching).
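The filter step can be sketched in Python as follows, using the four example signatures above. The function name `candidate_pairs` and the dictionary layout are assumptions for illustration. Each signature element maps to the list of token sets containing it, and any two token sets that share a list become a candidate pair.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(signatures):
    """Build an inverted index and return it with the candidate pairs it implies."""
    inverted = defaultdict(list)
    for name, signature in signatures.items():
        for element in signature:
            inverted[element].append(name)
    pairs = set()
    for names in inverted.values():
        # Token sets sharing at least one signature element are candidates.
        for a, b in combinations(sorted(names), 2):
            pairs.add((a, b))
    return inverted, pairs


SIGNATURES = {
    "T1": ["ad", "ac", "dc"],
    "T2": ["be", "cf", "em"],
    "T3": ["ad", "ab", "ac"],
    "T4": ["bm", "cf", "be"],
}
```

With these signatures, the inverted list of "ad" is T1 and T3, and the candidate pairs are (T1, T3) and (T2, T4). The pair (T1, T2) never appears, so it is pruned without any similarity computation.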

In block 140, a second postal address corresponding to the matching signature is identified. The identification is based on the comparison results from block 135.

In block 145, a match between the first postal address and the second postal address is identified. The match is identified based on the matching signature. Optionally, identifying the match includes comparing the at least one component, removed in block 115, to a corresponding component of the second postal address. This optional technique can be performed using a direct comparison between the first postal address (including the removed component) and a list of second postal addresses having both matching signatures and respective corresponding components.

In an example, the matching addresses can be displayed with a video display. The display can depict the matching address on a map, in a table, the like, or a combination thereof.

The method 100 for comparing postal addresses can be implemented in a scalable manner, and multiple instances can be implemented concurrently (with each instance comparing the first postal address against different pluralities of stored signatures). A dispatching engine can route the first postal address for processing by a specific instance of the method 100 for comparing postal addresses. The dispatching engine can use an index correlating signatures to the specific instance of the method 100 for comparing postal addresses.

The foregoing blocks are not limiting of the examples. The blocks can be combined and/or the order can be rearranged, as practicable.

The following describes a simplified example implementation of the method 100, using a q-gram algorithm. A first postal address of “Markham 99 Apple Street” is received and standardized to “99 Apple Street Markham.” The numerical portion “99” can be removed and is stored, thus making the first postal address “Apple Street Markham.” A token set can be formed from “Apple Street Markham.” In this example, the token set is {Apple, Street, Markham}. If the first postal address included a misspelling such as “Stret” instead of “Street,” then the misspelling can be detected and corrected prior to creating a postal address signature. If a 2-gram set of tokens (i.e., bi-grams) is prepared, the bi-grams for the example are:

“Apple”={Ap, pp, pl, le}

“Street”={St, tr, re, ee, et}

“Markham”={Ma, ar, rk, kh, ha, am}.

Thus, if bi-grams are prepared, the postal address signature for the example is: {Ap, pp, pl, le, St, tr, re, ee, et, Ma, ar, rk, kh, ha, am}. Using the bi-grams provides higher accuracy than using a q-gram set where q is greater than or equal to three. Further, if a 3-gram set is prepared, the tri-grams for the example are:

“Apple”={App, ppl, ple}

“Street”={Str, tre, ree, eet}

“Markham”={Mar, ark, rkh, kha, ham}.

Thus, if a tri-gram set of tokens is prepared, the postal address signature for the example is: {App, ppl, ple, Str, tre, ree, eet, Mar, ark, rkh, kha, ham}. Using the tri-gram set enables faster processing than using a q-gram set where q is less than three.
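The bi-gram and tri-gram signatures above can be reproduced with a short Python sketch. The function names `qgrams` and `signature` are illustrative and not part of this disclosure. Duplicate q-grams are dropped while first-seen order is kept, in line with the removal of duplicative tokens described for block 130.

```python
def qgrams(token, q=2):
    """All contiguous substrings of length q; empty if the token is shorter than q."""
    return [token[i:i + q] for i in range(len(token) - q + 1)]


def signature(tokens, q=2):
    """Concatenate the q-grams of each token, keeping only the first occurrence."""
    result = []
    for token in tokens:
        for gram in qgrams(token, q):
            if gram not in result:
                result.append(gram)
    return result
```

For the token set {Apple, Street, Markham}, `signature(tokens, 2)` gives the fifteen bi-grams listed above and `signature(tokens, 3)` gives the twelve tri-grams.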

An example plurality of stored signatures can include bi-gram signatures:

104 Pear Street Markham={Pe, ea, ar, St, tr, re, ee, et, Ma, ar, rk, kh, ha, am} with a numerical component of 104.

94 Apple Street Markham={Ap, pp, pl, le, St, tr, re, ee, et, Ma, ar, rk, kh, ha, am} with a numerical component of 94.

99 Apple Street Markham={Ap, pp, pl, le, St, tr, re, ee, et, Ma, ar, rk, kh, ha, am} with a numerical component of 99.

64 Orange Street Markham={Or, ra, an, ng, ge, St, tr, re, ee, et, Ma, ar, rk, kh, ha, am} with a numerical component of 64.

65 Orange Street Markham={Or, ra, an, ng, ge, St, tr, re, ee, et, Ma, ar, rk, kh, ha, am} with a numerical component of 65.

43 Apple Street Ottawa={Ap, pp, pl, le, St, tr, re, ee, et, Ot, tt, ta, aw, wa} with a numerical component of 43.

99 Orange Street Ottawa={Or, ra, an, ng, ge, St, tr, re, ee, et, Ot, tt, ta, aw, wa} with a numerical component of 99.

104 Orange Street Ottawa={Or, ra, an, ng, ge, St, tr, re, ee, et, Ot, tt, ta, aw, wa} with a numerical component of 104.

Comparing the postal address signature ({Ap, pp, pl, le, St, tr, re, ee, et, Ma, ar, rk, kh, ha, am}), which has fifteen bi-grams, with stored signatures of the plurality of stored signatures provides that the stored signature of {Ap, pp, pl, le, St, tr, re, ee, et, Ma, ar, rk, kh, ha, am} is the closest match, thus {Ap, pp, pl, le, St, tr, re, ee, et, Ma, ar, rk, kh, ha, am} is chosen as the matching signature.

The matching signature of {Ap, pp, pl, le, St, tr, re, ee, et, Ma, ar, rk, kh, ha, am} corresponds to a stored address of “Apple Street Markham.” Thus, the stored address of “Apple Street Markham” is identified as the second postal address.

The first postal address of “Apple Street Markham” is compared with the second postal address of “Apple Street Markham” to identify a match between the two addresses. Optionally, if a numerical component, such as “99,” was removed from the first postal address, then the removed component “99” can be compared to the numerical components of the stored signatures which are also matching signatures. Thus, the “99” of the first postal address can be compared to the “94” of “94 Apple Street Markham” and the “99” of “99 Apple Street Markham” to yield a match between the first postal address and the second postal address of “99 Apple Street Markham.”

In another example, the removed component “99” can be restored to the first postal address prior to using a direct comparison between the first postal address (including the removed component) and a list of second postal addresses having both matching signatures and respective corresponding components. The resultant “99 Apple Street Markham” is directly compared to a list of second postal addresses having both matching signatures and respective corresponding numerical components: “94 Apple Street Markham” and “99 Apple Street Markham.” The direct comparison yields a best match of “99 Apple Street Markham.”
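The worked example can be traced end to end with a small Python sketch. It is a simplification under stated assumptions: every address begins with its street number, the closest match is taken to be the stored signature sharing the most bi-grams with the query, and the names `bigram_signature`, `STORED`, and `match` are hypothetical.

```python
def bigram_signature(text):
    """Set of bi-grams drawn from each whitespace-delimited token."""
    return {tok[i:i + 2] for tok in text.split() for i in range(len(tok) - 1)}


STORED = [
    "104 Pear Street Markham",
    "94 Apple Street Markham",
    "99 Apple Street Markham",
    "64 Orange Street Markham",
    "65 Orange Street Markham",
    "43 Apple Street Ottawa",
    "99 Orange Street Ottawa",
    "104 Orange Street Ottawa",
]


def match(query, stored):
    """Find stored addresses whose signature is closest, then compare numbers."""
    number, _, street = query.partition(" ")    # assumes a leading street number
    query_sig = bigram_signature(street)
    best, candidates = -1, []
    for address in stored:
        stored_number, _, rest = address.partition(" ")
        overlap = len(query_sig & bigram_signature(rest))
        if overlap > best:
            best, candidates = overlap, [(stored_number, address)]
        elif overlap == best:
            candidates.append((stored_number, address))
    # The removed numerical component decides among equally close signatures.
    return [address for stored_number, address in candidates
            if stored_number == number]
```

For the query "99 Apple Street Markham", the signature comparison leaves "94 Apple Street Markham" and "99 Apple Street Markham" as candidates, and the numerical comparison keeps only the latter. "99 Orange Street Ottawa" shares the number but not the signature, so it is never considered.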

FIGS. 2A-2B depict an example method 200 for preparing a plurality of stored signatures. The method 200 can be performed by the apparatus described hereby, such as a computing device 300 (as depicted in FIG. 3), an electronic device 405 (as depicted in FIG. 4), a server 415 (as depicted in FIG. 4), a remote platform 425 (as depicted in FIG. 4), the like, or a combination thereof. The method 200 can be at least a part of an address cleansing process, an address validating process, or a combination thereof. The method 200 can be implemented in different countries. For example, the method 200 can be implemented in countries using a Latin-based character set, an ISO/IEC-compatible character set, a UTF-8-compatible character set, a character-based language, the like, or a combination thereof.

In block 205, one or more postal addresses are received. The one or more postal addresses can be received via a computer network, from a computer, from a mobile device, from a wearable device, from a cloud-based computer network, the like, or a combination thereof.

The one or more postal addresses can include one or more components such as a street address, a postal code (for example, a zip code, a postcode, a codes postaux, a post code, an Eircode, a postal routing number, a Postal Index Number, a Codice di Avviamento Postale, a Código de Endereçamento Postal, a Postleitzahl, a postal zone code, a postal district code, the like, or a practicable combination thereof), a post office box number, a mailing address, a physical address, the like, or a combination thereof. In an example, the one or more postal addresses can have a standard format, as described herein. The one or more postal addresses can also have a format conforming to at least a portion of a postal standard of a country, a postal standard of a region within a country, a commonly-used address format of a country, a commonly-used address format of a region within a country, the like, or a practicable combination thereof.

In optional block 210, a component in the one or more postal addresses can be standardized prior to performing block 220. The standardizing can be done to meet a country-specific standard, at least a portion of a postal standard of a country, a postal standard of a region within a country, a commonly-used address format of a country, a commonly-used address format of a region within a country, the like, or a practicable combination thereof. The standardizing can be performed to meet a standard format, as described herein.

In optional block 215, at least one component of the one or more postal addresses is removed prior to performing block 220. For example, a numerical street address can be removed from the one or more postal addresses. A postal code can be removed. The removed component can be stored for later use. Thus, a signature set need not be generated for every postal address. One signature set can be generated to cover every postal address on the same street. Removing a numerical component from a postal address can reduce a volume of data to be compared, and can reduce a quantity of data in data sets. In an example, if there are ten streets, each with 100 houses, then there are 1000 street addresses. Instead of creating 1000 signatures, only 10 signatures are needed, one for each of the ten streets. Thus, when a query is entered (e.g., the method 100 is performed), the one query signature is compared to 10 signatures, instead of 1000 signatures. This technique can advantageously reduce a number of tokens, which reduces signature size. This technique also reduces a number of signatures created for the reference data, which uses less memory and decreases start-up time. Further, this technique requires fewer comparisons to match a query. Implementing the techniques herein can prepare postal addresses for processing in a manner that maximizes performance and minimizes memory requirements before the postal addresses are processed.
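The ten-street arithmetic above can be checked with a minimal Python sketch. The helper name `strip_house_number`, the regular expression, and the synthetic street names are assumptions made for illustration only; the disclosure does not prescribe how the numerical component is detected.

```python
import re

def strip_house_number(address):
    """Split a leading numerical street address from the rest of the address.

    Returns (number, remainder); number is "" when no leading digits exist.
    Assumes the number comes first, as in the standardized examples.
    """
    match = re.match(r"\s*(\d+)\s+(.*\S)\s*$", address)
    if match:
        return match.group(1), match.group(2)
    return "", address.strip()

# Synthetic reference data: ten hypothetical streets with 100 houses each.
addresses = [
    f"{number} Street{street} Markham"
    for street in range(10)
    for number in range(1, 101)
]

# One signature is needed per distinct remainder, not per address.
remainders = {strip_house_number(a)[1] for a in addresses}
print(len(addresses), len(remainders))  # prints: 1000 10
```

The removed number is returned alongside the remainder so that it can be stored for later use, as the block describes.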

In block 220, a respective token set is formed from at least a portion of each postal address in the one or more postal addresses. The respective token set can be formed at least in part from a street name, a city name, or a combination thereof. Including numerical components in the respective token set of each postal address is optional.

In optional block 225, a misspelling in the one or more postal addresses is corrected prior to creating a respective postal address signature. Optional block 225 can be performed any time after performing block 205 and before performing block 230. In an example, correcting misspellings is limited to certain address components. If two address records are very similar, but valid, a minor correction can be performed to prevent an error. As an example, street types can be changed. For example, given the following addresses:

    3) 10 APPLE STREET MARKHAM ON
    4) 10 APPLE STRET MARKHAM ON

The misspelling of “stret” is corrected to “street,” and both addresses are standardized as 10 APPLE ST MARKHAM ON. Thus, the addresses can be combined, so only one address remains.

However, not all potential misspellings are necessarily corrected. As an example, street names need not be changed. For example, given the following addresses:

    3) 10 APPLE STREET MARKHAM ON
    4) 10 APPLA STREET MARKHAM ON

It is possible that “Appla” is a correct street name, and is not misspelled. In this case, the street name is not changed. Thus, there are two records in this case:

    3) 10 APPLE STREET MARKHAM ON
    4) 10 APPLA STREET MARKHAM ON
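The distinction drawn in the two examples above (street types are corrected, street names are left alone) can be sketched as follows. This is only an illustration under stated assumptions: the `STREET_TYPES` table, the function name `normalize_street_type`, and the 0.85 similarity cutoff are hypothetical, and `difflib.get_close_matches` from the Python standard library stands in for whatever spelling check an implementation actually uses.

```python
import difflib

# Hypothetical lookup of street types and their standardized abbreviations.
STREET_TYPES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "DRIVE": "DR"}

def normalize_street_type(address, cutoff=0.85):
    """Correct near-misses of known street types and abbreviate them.

    Tokens that are not close to any known street type (street names,
    cities, numbers) pass through unchanged.
    """
    out = []
    for token in address.upper().split():
        close = difflib.get_close_matches(
            token, list(STREET_TYPES), n=1, cutoff=cutoff
        )
        out.append(STREET_TYPES[close[0]] if close else token)
    return " ".join(out)

print(normalize_street_type("10 APPLE STRET MARKHAM ON"))
print(normalize_street_type("10 APPLA STREET MARKHAM ON"))
```

With this table, “STRET” is treated as “STREET” and abbreviated to “ST”, while “APPLA” is not close to any street type and is kept, so the two “APPLE” records collapse into one and the “APPLA” record stays distinct.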

In block 230, a respective postal address signature is created, from the respective token set, for each postal address in the one or more postal addresses. The respective postal address signature can be a fuzzy-token based signature. The respective postal address signature can be a q-gram based signature, with “q” being a number equal to or greater than two. Optionally, the respective postal address signature is a partition-NED based signature. The respective postal address signature can be a Deletion-Based neighborhood generation algorithm-based signature, an AO (log n) Signature-Based String Matching Algorithm-based signature, an AdaptJoin algorithm-based signature, a VChunk algorithm-based signature, a PassJoin algorithm-based signature, a FastSS algorithm-based signature, an ExB: Exclusion-based string matching algorithm-based signature, a Part-Enum algorithm-based signature, a Partition-ED algorithm-based signature, the like, or a combination thereof.

In an example, capital letters can be replaced with lower-case letters to improve a probability of a match. Further, duplicative tokens can be removed from the respective postal address signature to shorten the postal address signature, reduce a data set size, and reduce processing time. In other words, once the respective postal address signature is created, the respective postal address signature can further be pruned to reduce the respective postal address signature's size and thus its impact on memory use and processor performance. There are different ways to prune the respective postal address signature to make the respective postal address signature more compact and reduce the computation time. The provided methods and apparatus reduce the size of the respective postal address signature as much as possible, while retaining high accuracy of finding a match.
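One possible form of the pruning described above is sketched here in Python. The function name `prune_signature` is hypothetical, and keeping the first occurrence of each token is an assumption; the disclosure only states that duplicative tokens can be removed and capitals lower-cased.

```python
def prune_signature(signature):
    """Lower-case every token and drop duplicates, keeping first-seen order."""
    lowered = (token.lower() for token in signature)
    # dict.fromkeys preserves insertion order, so the first copy survives.
    return list(dict.fromkeys(lowered))

# 2-gram signature of "Pear Street Markham"; "ar" occurs twice.
signature = ["Pe", "ea", "ar", "St", "tr", "re", "ee", "et",
             "Ma", "ar", "rk", "kh", "ha", "am"]
pruned = prune_signature(signature)
print(len(signature), len(pruned))  # prints: 14 13
```

Here the second “ar” (from “Markham”) duplicates the one from “Pear”, so the pruned signature holds 13 tokens instead of 14.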

In an example, tokens in the postal address record can be weighted. Weighting tokens can improve accuracy of a comparison by reducing an impact of commonly occurring tokens on the comparison process. Weighting tokens can increase an impact of rarely occurring tokens on the comparison process. Further, relatively more important tokens can have a higher weight, while less important tokens can have a lower weight.
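The disclosure does not name a weighting formula. As one illustrative assumption, an inverse-document-frequency style weight, log(N / df), gives common tokens a low weight and rare tokens a high weight, which matches the behavior described above. The function name `idf_weights` and the sample token sets are hypothetical.

```python
import math
from collections import Counter

def idf_weights(token_sets):
    """Weight each token by log(N / df): rare tokens score high, common low."""
    n = len(token_sets)
    # Count in how many token sets each token appears.
    document_frequency = Counter(t for tokens in token_sets for t in set(tokens))
    return {t: math.log(n / df) for t, df in document_frequency.items()}

token_sets = [
    {"Pear", "Street", "Markham"},
    {"Apple", "Street", "Markham"},
    {"Orange", "Street", "Markham"},
    {"Apple", "Street", "Ottawa"},
    {"Orange", "Street", "Ottawa"},
]
weights = idf_weights(token_sets)
for token in ("Street", "Markham", "Apple", "Pear"):
    print(token, round(weights[token], 3))
```

In this sample, “Street” appears in every token set and receives a weight of zero, so it contributes nothing to a weighted comparison, while “Pear” appears once and receives the highest weight.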

In block 235, at least one respective postal address in the one or more postal addresses is stored along with each respective postal address's respective postal address signature. Optionally, any respective removed components are stored with the respective postal address signature.

In an example, the one or more postal addresses can be displayed with a video display. The display can depict the one or more postal addresses on a map, in a table, the like, or a combination thereof.

The foregoing blocks are not limiting of the examples. The blocks can be combined and/or the order can be rearranged, as practicable.

An example implementation of the method 200 follows. In this non-limiting example, the following postal addresses are received:

104 Pear Street Markham
94 Apple Street Markham
99 Apple Street Markham
64 Orange Street Markham
65 Orange Street Markham
43 Apple Street Ottawa
99 Orange Street Ottawa
104 Orange Street Ottawa

If one of the postal addresses received in block 205 is “Markham 104 Pear Street” then “Markham 104 Pear Street” can be standardized to “104 Pear Street Markham” thus making the postal address “104 Pear Street Markham.” The numerical portion “104” can be removed and stored, thus making the postal address “Pear Street Markham.” A respective token set can be formed from “Pear Street Markham.” In this example, the respective token set is {Pear, Street, Markham}. One or more tokens in the token set can be weighted. If the postal address included a misspelling such as “Stret” instead of “Street” then the misspelling can be detected and corrected prior to creating the respective postal address signature. If a 2-gram set of tokens is prepared, the respective token signatures for the “Pear Street Markham” example are:

“Pear”={Pe, ea, ar}
“Street”={St, tr, re, ee, et}
“Markham”={Ma, ar, rk, kh, ha, am}.

Thus, if a 2-gram set of respective tokens is prepared, the respective postal address signature for “Pear Street Markham” is: {Pe, ea, ar, St, tr, re, ee, et, Ma, ar, rk, kh, ha, am}. The respective postal address signatures for the received postal addresses are:

104 Pear Street Markham={Pe, ea, ar, St, tr, re, ee, et, Ma, ar, rk, kh, ha, am} with a numerical component of 104.
94 Apple Street Markham={Ap, pp, pl, le, St, tr, re, ee, et, Ma, ar, rk, kh, ha, am} with a numerical component of 94.
99 Apple Street Markham={Ap, pp, pl, le, St, tr, re, ee, et, Ma, ar, rk, kh, ha, am} with a numerical component of 99.
64 Orange Street Markham={Or, ra, an, ng, ge, St, tr, re, ee, et, Ma, ar, rk, kh, ha, am} with a numerical component of 64.
65 Orange Street Markham={Or, ra, an, ng, ge, St, tr, re, ee, et, Ma, ar, rk, kh, ha, am} with a numerical component of 65.
43 Apple Street Ottawa={Ap, pp, pl, le, St, tr, re, ee, et, Ot, tt, ta, aw, wa} with a numerical component of 43.
99 Orange Street Ottawa={Or, ra, an, ng, ge, St, tr, re, ee, et, Ot, tt, ta, aw, wa} with a numerical component of 99.
104 Orange Street Ottawa={Or, ra, an, ng, ge, St, tr, re, ee, et, Ot, tt, ta, aw, wa} with a numerical component of 104.
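The 2-gram signatures listed above can be reproduced with a short Python sketch. The function names `qgrams` and `address_signature` are hypothetical, and splitting tokens on whitespace is an assumption; the sketch only illustrates the sliding-window construction of a q-gram based signature with q of at least two.

```python
def qgrams(token, q=2):
    """Return the overlapping q-grams of a token, in order.

    A token shorter than q yields an empty list.
    """
    if q < 2:
        raise ValueError("q must be at least 2")
    return [token[i:i + q] for i in range(len(token) - q + 1)]

def address_signature(address, q=2):
    """Concatenate the q-grams of each whitespace-separated token."""
    signature = []
    for token in address.split():
        signature.extend(qgrams(token, q))
    return signature

print(qgrams("Pear"))
print(address_signature("Pear Street Markham"))
```

The numerical component is assumed to have been removed before the signature is built, so “Pear Street Markham” is passed in rather than “104 Pear Street Markham”.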

The respective postal address signatures for the one or more postal addresses can be stored as one address corresponding to one signature. In another example, the respective postal address signatures for the one or more postal addresses can be combined (for example one signature per street) and stored as:

Pear Street Markham={Pe, ea, ar, St, tr, re, ee, et, Ma, ar, rk, kh, ha, am} with a numerical component of 104.
Apple Street Markham={Ap, pp, pl, le, St, tr, re, ee, et, Ma, ar, rk, kh, ha, am} with numerical components of 94 and 99.
Orange Street Markham={Or, ra, an, ng, ge, St, tr, re, ee, et, Ma, ar, rk, kh, ha, am} with numerical components of 64 and 65.
Apple Street Ottawa={Ap, pp, pl, le, St, tr, re, ee, et, Ot, tt, ta, aw, wa} with a numerical component of 43.
Orange Street Ottawa={Or, ra, an, ng, ge, St, tr, re, ee, et, Ot, tt, ta, aw, wa} with numerical components of 99 and 104.
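The combined storage shown above amounts to grouping the removed numerical components under one key per street and city. A minimal Python sketch follows; the function name `combine_by_street` and the use of a dictionary as the store are assumptions, and each address is assumed to begin with its number.

```python
from collections import defaultdict

received = [
    "104 Pear Street Markham", "94 Apple Street Markham",
    "99 Apple Street Markham", "64 Orange Street Markham",
    "65 Orange Street Markham", "43 Apple Street Ottawa",
    "99 Orange Street Ottawa", "104 Orange Street Ottawa",
]

def combine_by_street(addresses):
    """Map each street-and-city remainder to its removed numerical components."""
    combined = defaultdict(list)
    for address in addresses:
        # Assumes a leading number; int() raises ValueError otherwise.
        number, _, remainder = address.partition(" ")
        combined[remainder].append(int(number))
    return dict(combined)

store = combine_by_street(received)
for street, numbers in store.items():
    print(street, numbers)
```

Eight received addresses reduce to five keys, so only five signatures need to be generated and stored, each alongside its list of numerical components.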

FIG. 3 illustrates the example computing device 300 suitable for implementing examples of the presently disclosed subject matter. At least a portion of the methods, sequences, algorithms, steps, or blocks described in connection with the examples disclosed hereby can be embodied directly in hardware, in software executed by a processor (for example, a processor described hereby), or in a combination of the two. In an example, a processor includes multiple discrete hardware components. A software module can reside in a storage medium (for example, a memory device), such as a random-access memory (RAM), a flash memory, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable disk, a compact disc read-only memory (CD-ROM), a storage medium, the like, or a combination thereof. An example storage medium (for example, a memory device) can be coupled to the processor so the processor can read information from the storage medium, write information to the storage medium, or both. In an example, the storage medium can be integral with the processor.

Further, examples provided hereby are described in terms of sequences of actions to be performed by, for example, one or more elements of a computing device. The actions described hereby can be performed by a specific circuit (for example, an application specific integrated circuit (ASIC)), by program instructions being executed by one or more processors, or by a combination of both. Additionally, a sequence of actions described hereby can be embodied entirely within any form of non-transitory computer-readable storage medium having stored thereby a corresponding set of computer instructions which, upon execution, cause an associated processor (such as a special-purpose processor) to perform at least a portion of a method, a sequence, an algorithm, a step, or a block described hereby. Performing at least a part of a function described hereby can include initiating at least a part of a function described hereby, at least a part of a method described hereby, the like, or a combination thereof. In an example, execution of the stored instructions can transform a processor and any other cooperating devices into at least a part of an apparatus described hereby. A non-transitory (that is, a non-transient) machine-readable medium specifically excludes a transitory propagating signal. Additionally, a sequence of actions described hereby can be embodied entirely within any form of non-transitory computer-readable storage medium having stored thereby a corresponding set of computer instructions which, upon execution, configure the processor to create specific logic circuits (for example, one or more tangible electronic circuits configured to perform a logical operation). Thus, examples may be in a number of different forms, all of which have been contemplated to be within the scope of the disclosure.

In an example, when a general-purpose computer (for example, a processor) is configured to perform at least a portion of a method described hereby, then the general-purpose computer becomes a special-purpose computer which is not generic and is not a general-purpose computer. In an example, loading a general-purpose computer with special programming can cause the general-purpose computer to be configured to perform at least a portion of a method, a sequence, an algorithm, a step, or a block described in connection with an example disclosed hereby. In an example, a combination of two or more related method steps disclosed hereby can form a sufficient algorithm. A sufficient algorithm can constitute special programming. Special programming can constitute any software which can cause a computer (for example, a general-purpose computer, a special-purpose computer, etc.) to be configured to perform one or more functions, features, steps, algorithms, blocks, or a combination thereof, as disclosed hereby.

The computing device 300 can be, for example, a desktop computer, a laptop computer, a mobile device, the like, or a combination thereof. The computing device 300 can include a processor 305, a bus 310, a memory 315 (such as random-access memory (RAM), read-only memory (ROM), flash RAM, the like, or a combination thereof), a video display 320 (such as a display screen), a user input interface 325 (which can include one or more controllers and associated user input devices such as a keyboard, mouse, touch screen, the like, or a combination thereof), a fixed storage device 330 (such as a hard drive, flash storage, the like, or a combination thereof), a removable media device 335 (operative to control and receive an optical disk, flash drive, the like, or a combination thereof), a network interface 340 operable to communicate with one or more remote devices via a suitable network connection, or a combination thereof. Examples of the disclosed subject matter can be implemented in, and used with, different component and network architectures.

The processor 305 is configured to control operation of the computing device 300, including performing at least a part of a method described hereby. The processor 305 can perform logical and arithmetic operations based on processor-executable instructions stored within the memory 315. The processor 305 can execute instructions stored in the memory 315 to implement at least a part of a method described herein (for example, the processing illustrated in FIGS. 1A-2B). The instructions, when executed by the processor 305, can transform the processor 305 into a special-purpose processor that causes the processor to perform at least a part of a function described hereby.

The processor 305 can comprise or be a component of a processing system implemented with one or more processors. The one or more processors can be implemented with a microprocessor, a microcontroller, a digital signal processor, a field programmable gate array (FPGA), a programmable logic device (PLD), an application-specific integrated circuit (ASIC), a controller, a state machine, gated logic, a discrete hardware component, a dedicated hardware finite state machine, any other suitable entity that can at least one of manipulate information (for example, calculating, logical operations, the like, or a combination thereof), control another device, the like, or a combination thereof. The processor 305 may also be referred to as a central processing unit (CPU), a special-purpose processor, or both.

The bus 310 interconnects components of the computing device 300. The bus 310 can enable information communication between the processor 305 and one or more components coupled to the processor 305. The bus 310 can include a data bus, a power bus, a control signal bus, a status signal bus, the like, or a combination thereof. The components of the computing device 300 can be coupled together to communicate with each other using a different suitable mechanism.

The memory 315, which can include at least one of read-only memory (ROM), random access memory (RAM), a flash memory, an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a register, other memory, the like, or a combination thereof, stores information (for example, data, instructions, software, the like, or a combination thereof) and is configured to provide the information to the processor 305. The RAM can be a main memory configured to store an operating system, an application program, the like, or a combination thereof. The ROM (for example, a flash memory) can be configured to store a basic input-output system (BIOS) which can control basic hardware operation such as interaction of the processor 305 with peripheral components. The memory 315 can also include a non-transitory machine-readable medium configured to store software. Software can mean any type of instructions, whether referred to as at least one of software, firmware, middleware, microcode, hardware description language, the like, or a combination thereof. Instructions can include code (for example, in source code format, in binary code format, executable code format, or in any other suitable code format).

The video display 320 can include a component configured to visually convey information to a user of the computing device 300.

The user input interface 325 can include a keypad, a microphone, a speaker, a display, the like, or a combination thereof. The user input interface 325 can include a component configured to convey information to a user of the computing device 300, receive information from the user of the computing device 300, or both.

The fixed storage device 330 can be integral with the computing device 300 or can be separate and accessed through other interfaces. The fixed storage device 330 can be an information storage device which is not configured to be removed during use, such as a hard disk drive.

The removable media device 335 can be integral with the computing device 300 or can be separate and accessed through other interfaces. The removable media device 335 can be an information storage device which is configured to be removed during use, such as a memory card, a jump drive, flash memory, the like, or a combination thereof. Code to implement the present disclosure can be stored in computer-readable storage media such as one or more of the memory 315, the fixed storage device 330, the removable media device 335, a remote storage location, the like, or a combination thereof.

The network interface 340 can electrically couple the computing device 300 to a network and enable exchange of information between the computing device 300 and the network. The network, in turn, can couple the computing device 300 to another electronic device, such as a remote server, a remote storage medium, the like, or a combination thereof. The network can enable exchange of information between the computing device 300 and the electronic device.

The network interface 340 can provide a connection via a wired connection, a wireless connection, or a combination thereof. The network interface 340 can provide such connection using any suitable technique and protocol as is readily understood by one of skill in the art. Example techniques and protocols include digital cellular telephone, Wi-Fi™, Bluetooth®, near-field communications (NFC), the like, and practicable combinations thereof. For example, the network interface 340 can enable the computing device 300 to communicate with other computers via one or more local, wide-area, or other communication networks. Other devices or components (not shown in FIG. 3) (for example, document scanners, digital cameras, and the like) can be coupled via the network interface 340.

All of the components illustrated in FIG. 3 need not be present to practice the present disclosure. Further, the components can be coupled in different ways from that illustrated.

FIG. 4 depicts an example network 400 suitable for implementing examples of the presently disclosed subject matter. The network 400 includes the electronic device 405. The electronic device 405 can include the computing device 300, a local computer, a smart phone, a mobile device, a tablet computer, an electronic device described hereby (as is practicable), the like, or a combination thereof. The electronic device 405 is electrically coupled to a network 410.

The network 410 can be a private network, a local network, a wide-area network, the Internet, any suitable communication network, the like, or a combination thereof. The network 410 can be implemented on any suitable platform including a wired network, a wireless network, an optical network, the like, or a combination thereof.

The network 410 can enable the electronic device 405 to communicate (for example, access) with one or more remote devices, such as the server 415, a database 420, the like, or a combination thereof. In a further example, a remote device can be configured to provide intermediary access, such as where the server 415 is configured to provide access to resources stored in the database 420. The network 410 can enable the electronic device 405 to communicate (for example, access) with the remote platform 425. For example, the remote platform 425 can be a cloud computing arrangement, a search engine, a content delivery system, the like, or a combination thereof. The remote platform 425 can include the server 415, the database 420, the like, or a combination thereof.

All of the components illustrated in FIG. 4 need not be present to practice the present disclosure. Further, the components can be coupled in different ways from that illustrated.

As used hereby, the term “example” means “serving as an example, instance, or illustration.” Any example described as an “example” is not necessarily to be construed as preferred or advantageous over other examples. Likewise, the term “examples” does not require all examples include the discussed feature, advantage, or mode of operation. Use of the terms “in one example,” “an example,” “in one feature,” and/or “a feature” in this specification does not necessarily refer to the same feature and/or example. Furthermore, a particular feature and/or structure can be combined with one or more other features and/or structures. Moreover, at least a portion of the apparatus described hereby can be configured to perform at least a portion of a method described hereby.

It should be noted the terms “connected,” “coupled,” and any variant thereof, mean any connection or coupling between elements, either direct or indirect, and can encompass a presence of an intermediate element between two elements which are “connected” or “coupled” together via the intermediate element. Coupling and connection between the elements can be physical, logical, or a combination thereof. Elements can be “connected” or “coupled” together, for example, by using one or more wires, cables, printed electrical connections, electromagnetic energy, the like, or a combination thereof. The electromagnetic energy can have a wavelength at a radio frequency, a microwave frequency, a visible optical frequency, an invisible optical frequency, the like, or a practicable combination thereof. These are several non-limiting and non-exhaustive examples.

The term “signal” can include any signal such as a data signal, an audio signal, a video signal, a multimedia signal, an analog signal, a digital signal, the like, or a practicable combination thereof. Information and signals described hereby can be represented using any of a variety of different technologies and techniques. For example, data, an instruction, a process step, a process block, a command, information, a signal, a bit, a symbol, the like, or a practicable combination thereof, which are referred to hereby can be represented by a voltage, a current, an electromagnetic wave, a magnetic field, a magnetic particle, an optical field, an optical particle, the like, or a practicable combination thereof, depending at least in part on the particular application, at least in part on a design, at least in part on a corresponding technology, at least in part on like factors, or a practicable combination thereof.

A reference using a designation such as “first,” “second,” and so forth does not limit either the quantity or the order of those elements. Rather, these designations are used as a convenient method of distinguishing between two or more elements or instances of an element. A reference to first and second elements does not mean only two elements can be employed. A reference to first and second elements does not mean the first element must necessarily precede the second element. Also, unless stated otherwise, a set of elements can comprise one or more elements. In addition, terminology of the form “at least one of: X, Y, or Z” or “one or more of X, Y, or Z,” or “at least one of the group consisting of X, Y, and Z” can be interpreted as “X or Y or Z or any combination of these elements.” For example, this terminology can include X, or Y, or Z, or X and Y, or X and Z, or X and Y and Z, or 2X, or 2Y, or 2Z, and so on.

The terminology used hereby is for the purpose of describing particular examples and is not intended to be limiting. The singular forms “a,” “an,” and “the” include the plural forms as well, unless the context clearly indicates otherwise. In other words, the singular can portend the plural, where practicable. The terms “comprises,” “comprising,” “includes,” and “including,” specify a presence of a feature, an integer, a step, a block, an operation, an element, a component, the like, or a combination thereof. The terms “comprises,” “comprising,” “includes,” and “including,” do not necessarily preclude a presence or an addition of another feature, integer, step, block, operation, element, component, and the like.

In examples, an apparatus disclosed hereby can be at least a part of an electronic device, coupled to an electronic device, or a combination thereof, where the electronic device can be, but is not limited to, a mobile device, a navigation device (for example, a global positioning system receiver, a global navigation satellite system receiver, the like, or a combination thereof), a wireless device, a computer, the like, or a combination thereof.

The term “mobile device” can describe, and is not limited to: a mobile phone, a mobile communication device, a mobile hand-held computer, a portable computer, a tablet computer, a wireless device, a wireless modem, a portable tele-transaction computer (PTC), a data processing device located (e.g., mounted) in a vehicle, the like, or a combination thereof.

Those of skill in the art will appreciate the example functions, methods, logical blocks, modules, circuits, and steps described in the examples disclosed hereby can be implemented as electronic hardware, computer software, or combinations of both, as is practicable. To illustrate this interchangeability of hardware and software, example functions, methods, logical blocks, modules, circuits, and steps have been described hereby generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon a particular application and design constraints imposed on an overall system. Skilled artisans can implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

Nothing stated or depicted in this application is intended to dedicate any component, step, block, feature, object, benefit, advantage, or equivalent to the public, regardless of whether the component, step, block, feature, object, benefit, advantage, or the equivalent is recited in the claims. Additionally, conventional elements of the current teachings may not be described in detail, or may be omitted, to avoid obscuring aspects of the current teachings. While this disclosure describes examples, changes and modifications can be made to the examples disclosed hereby without departing from the scope defined by the appended claims. The present disclosure is not intended to be limited to the specifically disclosed examples alone.

What is claimed is:
 1. A method for comparing postal addresses, comprising: receiving a first postal address; forming a token set from at least a portion of the first postal address; creating a postal address signature from the token set, wherein the postal address signature is a fuzzy-token based signature; identifying a matching signature by comparing the postal address signature to at least one stored signature in a plurality of stored signatures; identifying a second postal address corresponding to the matching signature; and identifying a match between the first postal address and the second postal address based on the matching signature.
 2. The method of claim 1, further comprising removing at least one component of the first postal address prior to the forming the token set.
 3. The method of claim 2, wherein the at least one component of the first postal address is a street address, a postal code, or a combination thereof.
 4. The method of claim 2, wherein the postal code is one or more of a zip code, a postcode, a codes postaux, a post code, an Eircode, a postal routing number, a Postal Index Number, a Codice di Avviamento Postale, a Código de Endereçamento Postal, a Postleitzahl, a postal zone code, a postal district code, or a combination thereof.
 5. The method of claim 2, wherein the identifying the match further comprises comparing the at least one removed component to a corresponding component of the second postal address.
 6. The method of claim 1, wherein the token set is formed at least in part from a street name, a city name, or a combination thereof.
 7. The method of claim 1, further comprising correcting a misspelling in the first postal address prior to creating the postal address signature.
 8. The method of claim 1, further comprising standardizing, prior to creating the token set, a component in the first postal address.
 9. The method of claim 8, wherein the standardizing is done to a country-specific standard.
 10. The method of claim 1, wherein the postal address signature is a q-gram based signature and “q” is a number equal to two or greater than two, a Deletion-Based neighborhood generation algorithm-based signature, an AO (log n) Signature-Based String Matching Algorithm-based signature, an AdaptJoin algorithm-based signature, a VChunk algorithm-based signature, a PassJoin algorithm-based signature, a FastSS algorithm-based signature, an ExB: Exclusion-based string matching algorithm-based signature, a Part-Enum algorithm-based signature, a Partition-ED algorithm-based signature, or a Partition-NED algorithm-based signature.
 11. The method of claim 1, further comprising receiving the first postal address via a computer network, from a computer, a mobile device, a wearable device, a cloud-based computer network, or a combination thereof.
 12. The method of claim 1, wherein the method is at least a part of an address cleansing process, an address validating process, a data cleansing process, a data integration process, or a combination thereof.
 13. The method of claim 1, further comprising weighting one or more tokens in the token set.
 14. An apparatus configured to compare postal addresses, comprising: a processor; and a memory coupled to the processor and configured to cause the processor to create specific logic circuits within the processor, wherein the specific logic circuits are configured to cause the processor to: initiate receiving a first postal address; initiate forming a token set from at least a portion of the first postal address; initiate creating a postal address signature from the token set, wherein the postal address signature is a fuzzy-token based signature; initiate identifying a matching signature by comparing the postal address signature to at least one stored signature in a plurality of stored signatures; initiate identifying a second postal address corresponding to the matching signature; and initiate identifying a match between the first postal address and the second postal address based on the matching signature.
 15. The apparatus of claim 14, wherein the memory is configured to cause the processor to initiate creating specific logic circuits configured to cause the processor to initiate removing at least one component of the first postal address prior to the forming the token set.
 16. The apparatus of claim 14, further comprising a computing device with which the processor is integrated.
 17. The apparatus of claim 14, wherein the processor is a microprocessor, a microcontroller, a digital signal processor, a field programmable gate array, a programmable logic device, an application-specific integrated circuit, a controller, a non-generic special-purpose processor, a state machine, a gated logic device, a discrete hardware component, a dedicated hardware finite state machine, or a combination thereof.
 18. A non-transitory computer-readable medium, comprising: processor-executable instructions stored thereon and configured to cause a processor to: initiate receiving a first postal address; initiate forming a token set from at least a portion of the first postal address; initiate creating a postal address signature from the token set, wherein the postal address signature is a fuzzy-token based signature; initiate identifying a matching signature by comparing the postal address signature to at least one stored signature in a plurality of stored signatures; initiate identifying a second postal address corresponding to the matching signature; and initiate identifying a match between the first postal address and the second postal address based on the matching signature.
 19. The non-transitory computer-readable medium of claim 18, wherein the processor-executable instructions further include instructions configured to initiate causing the processor to remove at least one component of the first postal address prior to the forming the token set.
 20. The non-transitory computer-readable medium of claim 18, wherein the at least one component of the first postal address is a street address, a postal code, or a combination thereof.
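
By way of non-limiting illustration only, the steps recited above (forming a token set, optionally weighting tokens, creating a q-gram based signature with q equal to two, and comparing that signature to stored signatures) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the claimed implementation: the function names, the "#" and "$" padding characters, the weighted Jaccard similarity measure, and the 0.6 threshold are all hypothetical choices made for the example.

```python
from collections import Counter


def qgrams(token, q=2):
    """Split one token into overlapping q-grams (padding characters are an assumption)."""
    padded = "#" * (q - 1) + token + "$" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def signature(address, q=2, weights=None):
    """Form a token set from the address and build a weighted q-gram signature."""
    weights = weights or {}  # hypothetical per-token weights; default weight is 1.0
    sig = Counter()
    for token in address.lower().replace(",", " ").split():
        weight = weights.get(token, 1.0)
        for gram in qgrams(token, q):
            sig[gram] += weight
    return sig


def similarity(a, b):
    """Weighted Jaccard similarity between two signatures (an assumed measure)."""
    keys = set(a) | set(b)
    union = sum(max(a[k], b[k]) for k in keys)
    if not union:
        return 0.0
    return sum(min(a[k], b[k]) for k in keys) / union


def best_match(query, stored, threshold=0.6):
    """Return (stored address, score) for the closest stored signature, or None."""
    query_sig = signature(query)
    best = None
    for address, stored_sig in stored.items():
        score = similarity(query_sig, stored_sig)
        if score >= threshold and (best is None or score > best[1]):
            best = (address, score)
    return best


# Stored signatures are computed once, then reused for every incoming address.
stored = {a: signature(a) for a in ["12 Main Streat", "99 Oak Avenue"]}
match = best_match("12 Main Street", stored)
```

In this sketch the misspelled stored address "12 Main Streat" is still identified as the match for "12 Main Street", because most of the q-grams are shared even though the tokens are not identical.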